DepthToSpace
=================
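Splits the depth (channel) dimension of the input tensor into blocks of ``block_size`` and rearranges them into the spatial dimensions (Depth -> Space). For an input of shape ``[batch, height, width, channel]``, the output shape is ``[batch, height * block_size, width * block_size, channel / (block_size * block_size)]``, so ``channel`` must be divisible by ``block_size * block_size``.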

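    Inputs:
        - **input** - Address of the input data.
        - **in_shape** - Input shape, in the format ``[batch, height, width, channel]``.
        - **block_size** - Block factor (a single integer).
        - **data_size** - Size of one element in bytes (for example, ``sizeof(float)``).
        - **core_mask (int, optional)** - Core mask (shared-memory version only).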

    Outputs:
        - **output** - Address of the output data.

    Supported platforms:
        ``FT78NE``
        ``MT7004``

    .. note::
        - Data types supported on FT78NE: fp32, fp64, cplx64, cplx128, int16, int8, int32.
        - Data types supported on MT7004: fp32, fp16, cplx64, int16, int32.
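
The rearrangement described above can be sketched as a plain, single-threaded C reference, which may be useful for checking results on a host machine. This is a minimal sketch under stated assumptions, not the library's implementation: ``depthtospace_ref`` is a hypothetical name, and the channel ordering is assumed to follow the common NHWC convention (as in TensorFlow's ``depth_to_space``), which this page does not confirm for the library. Elements are copied byte-wise, so the same routine covers any ``data_size``.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference helper (not part of the library).
 * Assumed mapping for NHWC data:
 *   out[b][h*bs + i][w*bs + j][c] = in[b][h][w][(i*bs + j)*out_c + c]
 * Returns 0 on success, -1 if the arguments are invalid. */
static int depthtospace_ref(const void *input, void *output,
                            const int *in_shape, int block_size, int data_size)
{
    assert(input != NULL && output != NULL && in_shape != NULL);

    const int batch = in_shape[0], in_h = in_shape[1];
    const int in_w = in_shape[2], in_c = in_shape[3];
    const int bs2 = block_size * block_size;

    /* channel must split evenly into block_size * block_size groups */
    if (block_size <= 0 || data_size <= 0 || in_c % bs2 != 0)
        return -1;

    const int out_h = in_h * block_size;
    const int out_w = in_w * block_size;
    const int out_c = in_c / bs2;

    const unsigned char *src = (const unsigned char *)input;
    unsigned char *dst = (unsigned char *)output;
    const size_t chunk = (size_t)out_c * (size_t)data_size; /* bytes per output pixel */

    for (int b = 0; b < batch; ++b)
        for (int h = 0; h < in_h; ++h)
            for (int w = 0; w < in_w; ++w)
                for (int i = 0; i < block_size; ++i)
                    for (int j = 0; j < block_size; ++j) {
                        /* element offset of the (i, j) channel group in the input pixel */
                        size_t s = (((size_t)b * in_h + h) * in_w + w) * in_c
                                 + (size_t)(i * block_size + j) * out_c;
                        /* element offset of the matching output pixel */
                        size_t d = (((size_t)b * out_h + (h * block_size + i)) * out_w
                                 + (w * block_size + j)) * out_c;
                        memcpy(dst + d * data_size, src + s * data_size, chunk);
                    }
    return 0;
}
```

Under these assumptions, an input of shape ``[1, 1, 2, 4]`` holding the values 1 to 8 with ``block_size = 2`` yields an output of shape ``[1, 2, 4, 1]`` holding ``1, 2, 5, 6, 3, 4, 7, 8``.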


**Shared-memory version:**

.. c:function:: void i8_depthtospace_s(int8_t *input, int8_t *output, const int *in_shape, int block_size, int data_size, int core_mask)
.. c:function:: void i16_depthtospace_s(int16_t *input, int16_t *output, const int *in_shape, int block_size, int data_size, int core_mask)
.. c:function:: void i32_depthtospace_s(int32_t *input, int32_t *output, const int *in_shape, int block_size, int data_size, int core_mask)
.. c:function:: void hp_depthtospace_s(half *input, half *output, const int *in_shape, int block_size, int data_size, int core_mask)
.. c:function:: void fp_depthtospace_s(float *input, float *output, const int *in_shape, int block_size, int data_size, int core_mask)
.. c:function:: void dp_depthtospace_s(double *input, double *output, const int *in_shape, int block_size, int data_size, int core_mask)
.. c:function:: void c64_depthtospace_s(float *input, float *output, const int *in_shape, int block_size, int data_size, int core_mask)
.. c:function:: void c128_depthtospace_s(double *input, double *output, const int *in_shape, int block_size, int data_size, int core_mask)

    **C usage example:**

    .. code-block:: c
        :linenos:
        :emphasize-lines: 10

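        // FT78NE multi-core example (also include the library header that declares fp_depthtospace_s)
        #include <stdio.h>

        int main(int argc, char *argv[]) {
            float *input = (float *)0xA0000000;   // multi-core version: input resides in DDR at 0xA0000000
            float *output = (float *)0xB0000000;  // multi-core version: output resides in DDR at 0xB0000000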
            int in_shape[4] = {10, 16, 16, 4};
            int block_size = 2;
            int core_mask = 0xff;
            fp_depthtospace_s(input, output, in_shape, block_size, sizeof(float), core_mask);
            return 0;
        }


**Private-memory version:**

.. c:function:: void i8_depthtospace_p(int8_t *input, int8_t *output, const int *in_shape, int block_size, int data_size)
.. c:function:: void i16_depthtospace_p(int16_t *input, int16_t *output, const int *in_shape, int block_size, int data_size)
.. c:function:: void i32_depthtospace_p(int32_t *input, int32_t *output, const int *in_shape, int block_size, int data_size)
.. c:function:: void hp_depthtospace_p(half *input, half *output, const int *in_shape, int block_size, int data_size)
.. c:function:: void fp_depthtospace_p(float *input, float *output, const int *in_shape, int block_size, int data_size)
.. c:function:: void dp_depthtospace_p(double *input, double *output, const int *in_shape, int block_size, int data_size)
.. c:function:: void c64_depthtospace_p(float *input, float *output, const int *in_shape, int block_size, int data_size)
.. c:function:: void c128_depthtospace_p(double *input, double *output, const int *in_shape, int block_size, int data_size)

    **C usage example:**

    .. code-block:: c
        :linenos:
        :emphasize-lines: 9

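        // FT78NE single-core example (also include the library header that declares fp_depthtospace_p)
        #include <stdio.h>

        int main(int argc, char *argv[]) {
            float *input = (float *)0x10000000;   // single-core version: input resides in L2 at 0x10000000
            float *output = (float *)0x10040000;  // single-core version: output resides in L2 at 0x10040000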
            int in_shape[4] = {10, 16, 16, 4};
            int block_size = 2;
            fp_depthtospace_p(input, output, in_shape, block_size, sizeof(float));
            return 0;
        }
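
Because the examples place ``input`` and ``output`` at fixed addresses, it can help to compute the output shape and buffer size before choosing them. The sketch below is illustrative only: ``depthtospace_out_shape`` and ``tensor_bytes`` are hypothetical helpers, not library functions. DepthToSpace only moves elements, so the output occupies the same number of bytes as the input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the library): derive the NHWC output shape.
 * Returns 0 on success, -1 if channel is not divisible by block_size^2. */
static int depthtospace_out_shape(const int in_shape[4], int block_size,
                                  int out_shape[4])
{
    assert(in_shape != NULL && out_shape != NULL);

    const int bs2 = block_size * block_size;
    if (block_size <= 0 || in_shape[3] % bs2 != 0)
        return -1;

    out_shape[0] = in_shape[0];              /* batch is unchanged       */
    out_shape[1] = in_shape[1] * block_size; /* height grows by bs       */
    out_shape[2] = in_shape[2] * block_size; /* width grows by bs        */
    out_shape[3] = in_shape[3] / bs2;        /* channel shrinks by bs^2  */
    return 0;
}

/* Hypothetical helper: total bytes occupied by a 4-D tensor. */
static size_t tensor_bytes(const int shape[4], int data_size)
{
    return (size_t)shape[0] * (size_t)shape[1] * (size_t)shape[2]
         * (size_t)shape[3] * (size_t)data_size;
}
```

For the shape used in the examples, ``{10, 16, 16, 4}`` with ``block_size = 2`` and ``sizeof(float) == 4``, this gives an output shape of ``{10, 32, 32, 1}`` and 40960 bytes for both input and output, which fits within the 0x40000-byte gap between the two L2 addresses in the single-core example.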